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Abstract 



When binary linear error-correcting codes are used over symmetric channels, a relaxed ver- 
sion of the maximum likelihood decoding problem can be stated as a linear program (LP). 
This LP decoder can be used to decode at bit-error-rates comparable to state-of-the-art belief 
propagation (BP) decoders, but with significantly stronger theoretical guarantees. However, LP 
| decoding when implemented with standard LP solvers does not easily scale to the block lengths 

of modern error correcting codes. In this paper we draw on decomposition methods from op- 
timization theory, specifically the Alternating Directions Method of Multipliers (ADMM), to 
develop efficient distributed algorithms for LP decoding. The key enabling technical result is 
a nearly linear time algorithm for two-norm projection onto the parity polytope. This allows 
us to use LP decoding, with all its theoretical guarantees, to decode large-scale error correcting 
codes efficiently. 

We present numerical results for two LDPC codes. The first is the rate-0.5 [2640, 1320] 
"Margulis" code, the second a rate-0.77 [1057.244] code. The "waterfall" region of LP decoding 
is seen to initiate at a slightly higher signal-to-noise ratio than for sum-product BP, however 
an error-floor is not observed for either code, which is not the case for BP. Our implementation 
1^ ■ of LP decoding using ADMM executes as quickly as our baseline sum-product BP decoder, is 

' fully parallelizable, and can be seen to implement a type of message-passing with a particularly 

■ simple schedule. 

, Keywords. LP Decoding. Alternating Direction Method of Multipliers. 
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1 Introduction 



While the problem of error correction decoding dates back at least to Richard Hamming's seminal 
work in the 1940s [19], the idea of drawing upon techniques of convex optimization to solve such 
problems apparently dates only to Jon Feldman's 2003 Ph.D. thesis [13|I15|. Feldman and his 
collaborators showed that, for binary codes used over symmetric channels, a relaxed version of the 
maximum likelihood (ML) decoding problem can be stated as a linear program (LP). Considering 
graph-based low-density parity-check (LDPC) codes, work by Feldman et al. and later authors |42j 
|37j |49] demonstrates that the bit-error-rate performance of LP decoding is competitive with 
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that of standard sum-product (and min-sum) belief propagation (BP) decoding. Furthermore, LP 
decoding comes with a certificate of correctness (ML certificate) [15j - verifying with probability 
one when the decoder has found the ML codeword. And, if a high-quality expander [101 114] or 
high-girth [2] code is used, LP decoding is guaranteed to correct a constant number of bit flips. 

A barrier to the adoption of LP decoding is that solving Feldman's relaxation using generic LP 
algorithms is not computationally competitive with BP. This is because standard LP solvers do 
not automatically exploit the rich structure inherent to the linear program. Furthermore, unlike 
BP, standard solvers do not have a distributed nature, limiting their scalability via parallelized 
(and hardware-compatible) implementation. In this paper we draw upon large-scale decomposition 
methods from convex optimization to develop an efficient, scalable algorithm for LP decoding. The 
result is a suite of new techniques for efficient error correction of modern graph-based codes, and 
insight into the elegant geometry of a fundamental convex object of error-correction, the parity 
polytope. 

A real- world motivation for developing efficient LP decoding algorithms comes from applications 
that have extreme reliability requirements. While suitably designed LDPC codes decoded using 
BP can achieve near-Shannon performance in the "waterfall" regime where the signal-to-noise ratio 
(SNR) is close to the code's threshold, they often suffer from an "error floor" in the high SNR 
regime. This limits the use of LDPCs in applications such as magnetic recording and fiber-optic 
transport networks. Error floors result from problematic arrangements in the graphical structure of 
the code (variously termed "pseudocodewords," "near-codewords," "trapping sets," "instantons," 
"absorbing sets" [18] |22j |27j |33j [llj). from the sub-optimal BP decoding algorithm, and from 
the particulars of the implementation of BP. Two natural approaches to improving error floor 
performance are to design codes with fewer problematic arrangements [20J [39J [47J [56J [38] [15] . or 
to develop improved decoding algorithms. As LP decoders have empirically not yet been observed 
to suffer from error floors [491155]. the approach taken herein is the latter. 

A second motivation is that an efficient LP decoder can help to develop closer and closer 
approximations of ML decoders. This is due to the strong theoretical guarantees associated with 
LP solvers. When the optimum vertex identified by an LP decoder is integer, the ML certificate 
property ensures that that vertex corresponds to the ML codeword. When the optimum vertex 
is non- integer (a "pseudocodeword"), one is motivated to tighten the relaxation to eliminate the 
problematic pseudocodeword, and try again. Various methods for tightening LP relaxations have 
been proposed [50] [12] [37] . In some settings one can regularly attain ML performance with few 
additional constraints [55] . 

In this paper, we produce a fast decomposition algorithm based on the Alternating Direction 
Method of Multipliers [5] (ADMM). This is a classic technique in convex optimization and has 
gained a good deal of popularity lately for solving problems in compressed sensing [1] and MAP 
inference in graphical models [29] . As we describe below, when we apply the ADMM algorithm to 
LP decoding, the algorithm is a message passing algorithm that bears a striking similarity to belief 
propagation. Variable update their estimates of the true variable assignment based on information 
(messages) from parity check and measurement nodes. The parity check nodes produce estimated 
assignments of local variables based on information from the variable nodes. 

To an optimization researcher, our application of ADMM would appear quite straight forward. 
However, our second contribution, beyond a naive implementation of ADMM, is a very efficient 
computation of the estimates at the parity checks. Each check update requires the computation 
of a Euclidean projection onto the aforementioned parity polytope. In Section HJ we demonstrate 
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that this projection can be computed in log-linear time in the degree of the check. This in turn 
enables us to write LP solvers with wall-clock speeds comparable to (and sometimes much faster) 
belief propagation decoders. 

The structure of the decoding LP has been examined before in pursuit of efficient implementa- 
tion. The first attempt was by Vontobel and Koetter [43 , 44J where the authors used the coordinate- 
ascent method to develop distributed message-passing type algorithms to solve the LP. There is 
some delicacy in attaining convergence. But, when their approach is matched with an appropriate 
message-passing schedule, as determined by Burshtein in (BJITj, converge to the optimal solution can 
be attained. Further, interior-point [ID] [45J [46J [36] and revised-simplex [24J approaches have also 
been applied. In a separate approach Yedida et al in |55j introduced "Difference-Map BP" decoding 
which is a simple distributed algorithm that seems to recover the performance of LP decoding, but 
does not have convergence guarantees. 

In this paper we frame the LP decoding problem in the template of ADMM. ADMM is dis- 
tributed, has strong convergence guarantees, simple scheduling, and, in general, has been observed 
to be more robust than coordinate ascent. In addition, we do not have to update parameters 
(step length) between iterations in ADMM. In Section [2] we introduce the LP decoding problem 
and introduce notation. We set up the general formulation of ADMM problems in Section [3] and 
specialize the formulation to the LP decoding problem. In Section 0] we present our main technical 
contributions wherein we develop the efficient projection algorithm required. We present numerical 
results in Section [5] and make some final remarks in Section [6l 

2 Background 

In this paper we consider a binary linear LDPC code C of length N defined byaMxJV parity-check 
matrix H. Each of the M parity checks, indexed by J = {1,2,..., M}, corresponds to a row in 
the parity check matrix H. Codeword symbols are indexed by the set X = {1,2, . . . , N}. The 
neighborhood of a check j, denoted by A/" c (j), is the set of indices i £l that participate in the jth 
parity check, i.e., A/" c (j) = {i \ = 1}. Similarly for a component i £ I, J\f v (i) = {j | Hj,% = 1}- 
Given a vector x G {0, 1}^, the jth parity-check is said to be satisfied if Ylietfjj) x i ^ s even. In 
other words, the set of values assigned to the Xi for i G A/" c (j) have even parity. We say that a 
length-n binary vector a; is a codeword, x G C, if and only if (iff) all parity checks are satisfied. 
In a regular LDPC code there is a fixed constant d, such that for all checks j G J , |A/" c (j)| = d. 
Also for all components i £ I, \M v (i)\ is a fixed constant. For simplicity of exposition we focus 
our discussion on regular LDPC codes. Our techniques and results extend immediately to general 
LDPC codes and to high density parity check codes as well. 

To denote compactly the subset of coordinates of x that participate in the jth check we introduce 
the matrix Pj. The matrix Pj is the binary d x N matrix that selects out the d components 
of x that participate in the jth check. For example, say the neighborhood of the jth check, 
A/" c (j) = {ii, i%, . . . id}, where i\ < 12 < ■ ■ ■ < id- Then, for all k £ [d] the (k, ifc)th entry of Pj is one 
and the remaining entries are zero. For any codeword x G C and for any j, PjX is an even parity 
vector of dimension d. In other words we say that PjX G for all j € J" (a "local codeword" 
constraint) where is defined as 

P d = { e G {0, l} d j ||e||i is even}. (2.1) 

Thus, Prf is the set of codewords (the codebook) of the length-d single parity-check code. 
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We begin by describing maximum likelihood (ML) decoding and the LP relaxation proposed by 
Feldman et al. Say vector x is received over a discrete memoryless channel described by channel 
law (conditional probability) W : X x X — > M>o, YlxeX W(x\x) = 1 for all x E X. Since the devel- 
opment is for binary codes \X\ = 2. There is no restriction on X. Maximum likelihood decoding 
selects a codeword x E C that maximizes p x ^ x (x\x), the probability that x was received given 
that x was sent. For discrete memoryless channel W, p x ^ x (x\x) = W ieX W(xi\xi). Equivalently, 
we select a codeword that maximizes X^gz^S W(xi\xi). Let 7« be the negative log-likelihood ratio, 
ji := log [H / (xj|0)/VF(xj|l)]. Since log W{xi\x,i) = —~/iXi + log H^XilO), ML decoding reduces to 
determining an x £ C that minimizes *y T x = Yliex^ iXi - Thus, ML decoding requires minimizing a 
linear function over the set of codewords 

Feldman et al. [15] show that ML decoding is equivalent to minimizing a linear cost over the 
convex hull of all codewords. In other words, minimize -y T x subject to x £ conv(C). The feasible 
region of this program is termed the "codeword" polytope. However, this polytope cannot be 
described tractably. Feldman's approach is first to relax each local codeword constraint PjX E F d 
to PjX E W d where 

¥F d = conv(P d ) = conv({e E {0, l} d | ||e||i is even}). (2.2) 

The object PP^ is called the "parity polytope" . It is the codeword polytope of the single parity- 
check code (of dimension d). Thus, for any codeword x E C, PjX is a vertex of PP^ for all j. 
When the constraints PjX E PP^ are intersected for all j E J the resulting feasible space is termed 
the "fundamental" polytope. Putting these ingredients together yields the LP relaxation that we 
study: 

minimize -y T x s.t. PjX E FF d V j ej. (2.3) 

The statement of the optimization problem in (|2.3p makes it apparent that compact represen- 
tation of the parity polytope PP^ is crucial for efficient solution of the LP. Study of this polytope 
dates back some decades. In [21] Jeroslow gives an explicit representation of the parity polytope 
and shows that it has an exponential number of vertices and facets in d. Later, in [51] , Yannakakis 
shows that the parity polytope has small lift, meaning that it is the projection of a polynomially 
faceted polytope in a dimension polynomial in d. Indeed, Yannakakis' representation requires a 
quadratic number of variables and inequalities. This is one of the descriptions discussed in [15] to 
state the LP decoding problem. 

Yannakakis' representation of a vector u E PP^ consists of variables fi s E [0, 1] for all even s < d. 
Variable fj, s indicates the contribution of binary (zero/one) vectors of Hamming weight sto it. Since 
u is a convex combination of even- weight binary vectors, Yltven s A 4 * = ^ n addition, variables 
Zi >s are used to indicate the contribution to Ui, the ith coordinate of u made by binary vectors of 
Hamming weight s. Overall, the following set of inequalities over 0(cP) variables characterize the 

1 This derivation applies to all binary- input DMCs. In the simulations of Section [5] we focus on the binary-input 
additive white Gaussian noise (AWGN) channel. To help make the definitions more tangible we now summarize how 
they specialize for the binary symmetric channel (BSC) with crossover probability p. For the BSC Xi G {0, 1}. If 
Xi = 1 then 7l = log[W(l|0)/W(l|l)] = log[p/(l - p)] and if Xi = then 7l = log[W(0|Q)/W(Q|l)] = log[(l - p)/p\. 
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parity polytope (see [51] and [15] for a proof). 
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This LP can be solved with standard solvers in polynomial time. However, the quadratic size of 
the LP prohibits its solution with standard solvers in real-time or embedded decoding applications. 
In Section H2] we show that any vector u G PP^ can always be expressed as a convex combination of 
binary vectors of Hamming weight r and r + 2 for some even integer r. Based on this observation we 
develop a new formulation for the parity polytope that consists of 0(d) variables and constraints. 
This is a key step towards the development of an efficient decoding algorithm. Its smaller description 
complexity also makes our formulation particularly well suited for high-density codes whose study 
we leave for future work. 

3 Decoupled relaxation and optimization algorithms 

In this section we present the ADMM formulation of the LP decoding problem and summarize 
our contributions. In Section 13.11 we introduce the general ADMM template. We specialize the 
template to our problem in Section 13.21 We state the algorithm in Section 13.31 and frame it in the 
language of message-passing in Section 13.41 

3.1 ADMM formulation 

To make the LP (|2.3p fit into the ADMM template we relax x to lie in the hypercube, x G [0, 1]^, 
and add the auxiliary "replica" variables zj G M. d for all j G J . We work with a decoupled 
parameterization of the decoding LP. 



The alternating direction method of multiplies works with an augmented Lagrangian which, for 
this problem, is 



minimize 7 x 
subject to Pjx = Zj V j G J 
Zj G PP d V j G J 
x G [0,1]*. 



(3.1) 





(3.2) 
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Here Xj E M d for j E J are the Lagrange multipliers and /i > is a fixed penalty parameter. 
We use A and z to succinctly represent the collection of Ays and ZjS respectively. Note that the 
augmented Lagrangian is obtained by adding the two-norm term of the residual to the ordinary 
Lagrangian. The Lagrangian without the augmentation can be optimized via a dual subgradient 
ascent method [4], but our experiments with this approach required far too many message passing 
iterations for practical implementation. The augmented Lagrangian smoothes the dual problem 
leading to much faster convergence rates in practice |31j . For the interested reader, we provide a 
discussion of the standard dual ascent method in the appendix. 

Let X and Z denote the feasible regions for variables x and z respectively: X = [0, 1]^ and we 
use z E Z to mean that z\ x z 2 x . . . x z\j\ E FF d x FF d x . . . x FF d , the l^-fold product of FF d . 
Then we can succinctly write the iterations of ADMM as 

x k+1 := axgmin xeX L fl (x,z k ,X k ) 

z k+l := argmin ze2 L^(x k+1 ,z, X k ) 

AJ+ 1 :=X k + l ji{P 3 x k+l -z k+l ). 

The ADMM update steps involve fixing one variable and minimizing the other. In particular, x k 
and z k are the fcth iterate and the updates to the x and z variable are performed in an alternating 
fashion. We use this framework to solve the LP relaxation proposed by Feldman et al. and hence 
develop a distributed decoding algorithm. 

3.2 ADMM Update Steps 

The a;-update corresponds to fixing z and A (obtained from the previous iteration or initialization) 
and minimizing L^{x,z, A) subject to x E [0,1]^. Taking the gradient of (13.2p . setting the result 
to zero, and limiting the result to the hypercube X = [0, 1]^, the :c-update simplifies to 




where P = ^jPjPj and Ilp ) i]jv(-) corresponds to projecting onto the hypercube [0,1]^. The 
latter can easily be accomplished by independently projecting the components onto [0, 1]: setting 
the components that are greater than 1 equal to 1, the components less than equal to 0, and 
leaving the remaining coordinates unchanged. Note that for any j, Pj Pj is a N x N diagonal binary 
matrix with non-zero entries at (i, i) if and only if i participates in the jth. parity check (i E A/" c (j)). 
This implies that J^jPj^Pj ^ s a diagonal matrix with the (i,i)th entry equal to |A/"„(i)|. Hence 
P 1 = Pj i s a diagonal matrix with l/\N v (i)\ as the ith. diagonal entry. 

Component-wise, the update rule corresponds to taking the average of the corresponding replica 
values, Zj, adjusted by the the scaled dual variable, Xj//J,, and taking a step in the negative log- 
likelihood direction. For any j E M v (i) let Zj denote the component of Zj that corresponds to 
the ith component of x, in other words the ith component of Pj z.\. Similarly let be the ith 

J J 

component of PjXj. With this notation the update rule for the ith component of x is 
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Each variable update can be done in parallel. 

The z-update corresponds to fixing x and A and minimizing L„(a;,A, z) subject to Zj G PP^ 
for all j G J . The relevant observation here is that the augmented Lagrangian is separable with 
respect to the zjs and hence the minimization step can be decomposed (or "factored") into \J\ 
separate problems, each of which be solved independently. This decouples the overall problem, 
making the approach scalable. 

We start from (|3.2p and concentrate on the terms that involve Zj. For each j G J the update 
is to find the Zj that minimizes 

^\\PjX - Zj\\l - \j Zj s.t. zj G PP d . 

Since the values of a; and A are fixed, so are PjX and A,//i. Setting v = PjX+Xj/fi and completing 
the square we get that the desired update z* is 

z] = argmin £ePPd \\v - z\\%. 
The z-update thus corresponds to |J7| projections onto the parity polytope. 

3.3 ADMM Decoding Algorithm 

The complete ADMM-based algorithm is specified in the Algorithm [1] box. We declare convergence 
when the replicas differ from the x variables by less than some tolerance e > 0. 

Algorithm 1 Given a binary iV-dimensional vector x G {0,1}^, parity check matrix H, and 
parameters [i and e, solve the decoding LP specified in Q3. 1 j> 

1: Construct the negative log-likelihood vector 7 based on received word x. 

2: Construct the d x N matrix Pj for all j G J . 

3: Initialize Zj and \j as the all zeros vector for all j G J . 

4: repeat 

5: Update ^^n [ o,i](p^(E i6 iV4o(4 ) -M A ? ) )-^ i )) for a11 * G X - 
6: for all j G J do 

7: Set Vj = PjX + \j/fl. 

8: Update Zj <— Hw d (vj) where Ilpp d (-) means project onto the parity polytope. 
9: Update Xj -s— Xj + fi (PjX — Zj). 
10: end for 

11: until maxj \\PjX — ZjW^ < e return x. 



3.4 ADMM Decoding as Message Passing Algorithm 

We now present a message-passing interpretation of the ADMM decoding algorithm, Algorithm 
We establish this interpretation using the "normal" factor graph representation [16] (sometimes also 
called "Forney-style" factor graphs) . One key difference between normal factor graphs and ordinary 
factor graphs is that the variables in normal factor graph representation are associated with the 

2 See [53] for another recent interpretation of ADMM as a message-passing algorithm. 
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edges of a regular factor graphs [23], and the constraints of the normal graph representation are 
associated with both factor and variable nodes of the regular representation. Sec [16,25] for details. 
In representing the ADMM algorithm as a message-passing algorithm the x and the replicas z are 
the variables in the normal graph. 

We denote by Xij(k) the replica associated with the edge joining node i £ I and node j £ J, 
where k indicates the feth iteration. Note that Xij 1 (k) = Xij 2 (k) = x\ for all £ J , where x\ is 
the value of at feth iteration in Algorithm [1] The "message" irii->j(k) := Xij(k) is passed from 
node i to node j at the beginning of the kth iteration. Incoming messages to check node j are 
denoted as m^j(k) := {m,i->j(k) : i G M c (j)}. The Zj can also be interpreted as the messages passed 
from check node j to the variable nodes in Af c {j), denoted as m^ffc) := {mj^i{k) : i £ A/" c (j)}. 
Let A^- := Xj/fi and A^ := A^- /fi. Then, for all j G M v (i) 

rrn^jik + 1) = 

The z-update can be rewritten as 

mj^(k + 1) = n PPd (m^j(k) + Xj(k)) . 

The X'j updated is 

X'jik + 1) = X'jik) + (m^jik) - mj->(k)) . 

With this interpretation, it is clear that the ADMM algorithm decouples the decoding problem and 
can be performed in a distributed manner. 

4 The geometric structure of PP^, and efficient projection onto 

In this section we develop our efficient projection algorithm. Recall that = {e £ {0, l} d \ ||e||i is even} 
and that PP^ = conv(P^). Generically we say that a point v E PP^ if and only if there exist a 
set of ej G P^ such that v = Y^i a i e i where ^ a% = 1 and an > 0. In contrast to this generic 
representation, the initial objective of this section is to develop a novel "two-slice" representation of 
any point v € PPfj: namely that any such vector can be written as a convex combination of vectors 
with Hamming weight r and r + 2 for some even integer r. We will then use this representation to 
construct an efficient projection. 

We open the section in Section 14.11 by describing the structured geometry of PP^ that we 
leverage, and laying out the results that will follow in ensuing sections. In Section 14.21 we prove 
a few necessary lemmas illustrating some of the symmetry structure of the parity polytope. In 
Section 14.31 we develop the two-slice representation and connect the ^i-norm of the projection of 
any onto PP^ to the (easily computed) "constituent parity" of the projection of v onto the 

unit hypercube. In Section [4.41 we present the projection algorithm. 

4.1 Introduction to the geometry of PP^ 

In this section we discuss the geometry of PP^. We develop intuition and foreshadow the results to 
come. We start by making a few observations about PP^. 
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• First, we can classify the vertices of PP^ by their weight. We do this by defining ¥ d , the 
constant- weight analog of P^, to be the set of weight-r vertices of PP^: 

P5 = {e€{0,l} d |||c||i=r}, (4.1) 

i.e., the constant-weight-r subcode of P^. Since all elements of P^ are in some P d for some 
even r, P^ = Ua< r <d:r evened- This gives us a new way to think about characterizing the 
parity polytope, 

PF d = conv(Uo< r <d. rev «i P5)- 

• Second, we define PP^ to be the convex hull of K, 

PPrf = conv(P^) = conv({eG {0, l} d | ||c||i=r}). (4.2) 

This object is a "permutahedron" , so termed because it is the convex hull of all permutations 
of a single vector; in this case a length-d binary vector with r ones. Of course, 

PP d = conv(U < r < d:rcvcn PP^). 

• Third, define the affine hyper-plane consisting of all vectors whose components sum to r as 

^ = {x€l d |l T x = r} 

where 1 is the length-<i all-ones vector. We can visualize PP^ as a "slice" through the the 
parity polytope defined as the intersection of T~L d with PP<j. In other words, a definition of 
PP^ equivalent to (JOJ is 

PP^ = PP d n U% 

for r an even integer. 

• Finally, we note that the PP^ are all parallel. This follows since all vectors lying in any of 
these permutahedra are orthogonal to 1. We can think of the line segment that connects the 
origin to 1 as the major axis of the parity polytope with each "slice" orthogonal to the axis. 

The above observations regarding the geometry of PP^ are illustrated in Fig. [TJ Our development 
will be as follows. First, in Sec. 14.21 we draw on a theorem from [28J about the geometry of 
permutahedra to assert that a point v G M. d is in PPJJ if and only if a sorted version of v is 
majorized (see Definition 14.3ft by the length-d vector consisting of r ones followed by d — r zeros 
(the sorted version of any vertex of PPJj). This allows us to characterize the PP^ easily. 

Second, we rewrite any point u G PPd as, per our second bullet above, a convex combination 
of points in slices of different weights r. In other words u = ^o<r<d-reven a r-u r where u r G PP^ 
and the a r are the convex weightings. We develop a useful characterization of PP^, the "two- 
slice" Lemma 14.61 that shows that two slices always suffices. In other words we can always write 
u = au r + (l-a)u r+ 2 where u r G PP^, u r+2 G PP^ +2 , < a < 1, and r = [|| U 1 1 J even ; where even 
is the largest even integer less than or equal to a. We term the lower weight, r, the "constituent" 
parity of the vector. 

Third, in Sec. 14.31 we show that given a point v G M rf that we wish to project onto PP^, it is 
easy to identify the constituent parity of the projection. To express this formally, let Uw d (y) be 
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(01111) (10111) 




Figure 1: The parity polytope PP^ can be expressed as the convex hull of "slices" through 
PPd, each of which contains all weight-r vertices. These sets, FP^ are permutahedra. They are 
all orthogonal to the line segment connecting the origin to the all-ones vector. The geometry 
is sketched for d = 5. 

the projection of v onto PP^- Then, our statement is that we can easily find the even integer r 
such that npp d (v) can be expressed as a convex combination of vectors in PP^ and PP^ +2 . 

Finally, in Sec. 14.41 we develop our projection algorithm. Roughly, our approach is as follows. 
Given a vector v 6 M. d we first compute r, the constituent parity of its projection. Given the 
two-slice representation, projecting onto PF^ is equivalent to determining an a £ [0,1], a vector 
a £ PP r d , and a vector b E PP^ 2 such that the £ 2 norm of v — aa — (1 — a)b is minimized. 

In [5] we showed that, given a, this projection can be accomplished in two steps. We first 
project v onto aPF^ = {x € M. d \0 < Xi < a,]CiLi x « = ar } a scaled version of PP^, scaled by 
the convex weighting parameter. Then we project the residual onto (1 — a)PF^. The object aPP^ 
is an l\ ball with box constraints. Projection onto aPP^ can be done efficiently using a type of 
waterfilling. Since the function min aePp r 5 € ppr+2 \\v — aa — (1 — a)6[|| is convex in a we can perform 
perform a one-dimensional line search (using, for example, the secant method) to determine the 
optimal value for a and thence the desired projection. 

In contrast to the original approach, in Section ^. 4l we develop a far more efficient algorithm that 
avoids the pair of projections and the search for a. In particular, taking advantage of the convexity 
in a we use majorization to characterize the convex hull of PF^ and PF^ +2 in terms of a few linear 
constraints (inequalities). As projecting onto the parity polytope is equivalent to projecting onto 
the convex hull of the two slices, we use the characterization to express the projection problem as 
a quadratic program, and develop an efficient method that directly solves the quadratic program. 
Avoiding the search over a yields a considerable speed-up over the original approach taken in [3]. 
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4.2 Permutation Invariance of the Parity Polytope and Its Consequences 

Let us first describe some of the essential features of the parity polytope that are critical to the 
development of our efficient projection algorithm. First, note the following 

Proposition 4.1 u £ PP^ if and only ifSu is in the parity polytope for every permutation matrix 
S. 

This proposition follows immediately because the vertex set P^ is invariant under permutations 
of the coordinate axes. 

Since we will be primarily concerned with projections onto the parity polytope, let us consider 
the optimization problem 

minimize z ||tJ — z\\2 subject to z € PP^ . (4.3) 

The optimal z* of this problem is the Euclidean projection of v onto PP^, which we denote by 
z* = Hpp d (v). Again using the symmetric nature of PP^, we can show the useful fact that if v is 
sorted in descending order, then so is Ilp^ d (v). 

Proposition 4.2 Given a vector v S M. d , the component-wise ordering of Hf>p d (v) is same as that 
ofv. 

Proof We prove the claim by contradiction. Write z* = Hwp d (v) and suppose that for indices i 
and j we have Vi > Vj but z* < z*. Since all permutations of z* are in the parity polytope, we can 
swap components i and j of z* to obtain another vector in PP^. Under the assumption Zj > z* and 
Vi — Vj > we have z*(vi — Vj) > z*(vi — Vj). This inequality implies that (v, — z*) 2 + (vj — z*) 2 > 
(vi — z*) 2 + (vj — z*) 2 , and hence we get that the Euclidean distance between v and z* is greater 
than the Euclidean distance between v and the vector obtained by swapping the components. ■ 

These two propositions allow us assume through the remainder of this section that our vectors 
are presented sorted in descending order unless explicitly stated otherwise. 

The permutation invariance of the parity polytope also lets us also employ powerful tools 
from the theory of majorization to simplify membership testing and projection. The fundamental 
theorem we exploit is based on the following definition. 

Definition 4.3 Let u and w be d-vectors sorted in decreasing order. The vector w majorizes u if 

q q 

< ^ Wfc V 1 < q < d, 
k=l k=X 
d d 

k=l k=l 

Our results rely on the following Theorem, which states that a vector lies in the convex hull of 
all permutations of another vector if and only if the former is majorized by the latter (see [28] and 
references therein). 

Theorem 4.4 Suppose u and w are d-vectors sorted in decreasing order. Then u is in the convex 
hull of all permutations of w if and only if w majorizes u. 
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To gain intuition for why this theorem might hold, suppose that u is in the convex hull of all 
of the permutations of w. Then u = Y2i=iPi^i w with Sj being permutation matrices, pi > 0, and 
l T p = 1. The matrix Q = ^ILiP*^* ^ s doubly stochastic, and one can immediately check that if 
u = Qw and Q is doubly stochastic, then w majorizes u. 

To apply majorization theory to the parity polytope, begin with one of the permutahedra PP^. 
We recall that ¥¥ d is equal to the convex hull of all binary vectors with weight s, equivalently the 
convex hull of all permutations of the vector consisting of s ones followed by d — s zeros. Thus, by 
Theorem H31 u G [0, l) d is in PP^ if and only if 

q 

^2 u k < min(g, s) V 1 < q < d, (4.4) 

k=l 
d 

^2u k = s. (4.5) 

k=l 

The parity polytope PP^ is simply the convex hull of all of the ¥¥ d with s even. Thus, we 
can use majorization to provide an alternative characterization of the parity polytope to that of 
Yannakakis or Jeroslow. 

Lemma 4.5 A sorted vector u 6 PP^ if and only if there exist non-negative coefficients {fi s }even s<d 
such that 

d 

M« = 1, Hs> 0- (4.6) 

s even 

q d 

Ufc < Ms min((7, s) V 1 < q < d (4-7) 

k=l s even 

d d 

k=l s even 



Proof First, note that every vertex of PP^ of weight s satisfies these inequalities with fj, s = 1 and 
/V = for s' ^ s. Thus u £ FF d must satisfy (gj^-gji]). Conversely, if u satisfies ^U^-^M,, then 
u is majorized by the vector 

d 

w = ^ fi s b s 

s even 

where b s is a vector consisting of s ones followed by d — s zeros, w is contained in PP^ as are all 
of its permutations. Thus, we conclude that u is also contained in PP^. ■ 

While Lemma 14.51 characterizes the containment of a vector in PP^, the relationship is not one- 
to-one; for a particular u £ PP^ there can be many sets {/U s } that satisfy the lemma. We will next 
show that there is always one assignment of fi s with only two non-zero fi s . 
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4.3 Constituent Parity of the Projection 

For a £ R, let [ajeven denote the "even-floor" of a, i.e., the largest even integer r such that r < a. 
Define the "even-ceiling," |~a] even similarly. For a vector u we term UMIiJeven the constituent parity 
of vector u. In this section we will show that if u £ PP^ has constituent parity r, then it can be 
written as a convex combination of binary vectors with weight equal to r and r + 2. This result is 
summarized by the following 



Lemma 4.6 ("Two-slice" lemma) A vector u £ PP^ iffu can be expressed as a convex combination 
r d and PP^ 4 



of vectors in W r d and PP r , +2 where r = [\\u\\i 



Proof Consider any (sorted) u £ PP^. Lemma 14.51 tells us that there is always (at least one) set 
{l^s} that satisfy (I4.6j) - (l4.8j) . Letting r be defined as in the lemma statement, we define a to be 
the unique scalar between zero and one that satisfies the relation ||w||i = ar + (1 — a)(r + 2): 

2 + r — l|tt||i , , . 

a = ^Uli. (4.9) 

Then, we choose the following candidate assignment: /j, r = a, fi r +2 = 1 — a, and all other /i s = 0. 
We show that this choice satisfies (|4.6p - (|4.8p which will in turn imply that there is a u r £ PP^J and 
a u r+ 2 £ PP^ +2 such that u = au r + (1 — a)u r+ 2- 

First, by the definition of a, (|4.6p and (|4.8p are both satisfied. Further, for the candidate set 
the relations (|4.7p and (14. 8p simplify to 



< a min(g, r) + (l— a) min(g, r+2), V l<g<d, (4-10) 

k=l 

d 

^2u k = ar+(l-a)(r + 2). (4.11) 

k=l 

To show that (|4.10p is satisfied is straightforward for the cases q < r and q > r+2. First consider any 
q < r. Since min(g, r) = mm(q, r + 2) = q, < 1 for all k, and there are only q terms, (I4.10P must 
hold. Second, consider any q > r+2. We use (|4.1ip to write Ylt=i u k = or+(l — a)(r+2) — J2q+i u k- 
Since u k > this is upper bounded by ar + (1 — a)(r + 2) which we recognize as the right-hand 
side of (|4.10p since r = mm(q, r) and r + 2 = min(g, r + 2). 

It remains to verify only one more inequality in (|4.10p namely the case when q = r+1, which is 



r+1 

Uk < ar + (1 - a)(r + 1) = 

k=i 



r + 1 — a. 



To show that the above inequality holds, we maximize the right-hand-side of (|4.7p across all valid 
choices of {/i s } and show that the resulting maximum is exactly r + 1 — a. Since this maximum is 
attainable by some choice of {/J> s } and our choice meets that bound, our choice is a valid choice. 

The logic is as follows. Since u £ PP^ any valid choice for {/J, s } must satisfy (I4.6P which, for 
q = r + 1, is 

r+1 d 

^u k < M s min(s,r + 1). (4.12) 

fc=l s even 
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To see that across all valid choice of {p s } the largest value attainable for the right hand side is 
precisely r + 1 — a consider the linear program 

maximize £) s even Ps min(s, r + 1) 

Subject tO J2s even Ms = 1 

E s even Ms* = <W + (1 - tt)(r + 2) 
> 0. 

The first two constraints are simply fj4.6j) and (|4.8p . Recognizing ar + (1 — a)(r + 2) = r + 2 — 2a, 
the dual program is 

minimize (r + 2 — 2a) Ai + A2 

subject to X\s + A2 > min(s, r + 1) V s even. 

Setting /i r = a, p r +2 = (1 — 01), the other primal variable to zero, Ai = 1/2, and A2 = r/2, satis- 
fies the Karush-Kuhn- Tucker (KKT) conditions for this primal/dual pair of LPs. The associated 
optimal cost is r + 1 — a. Thus, the right hand side of (|4.12p is at most r + 1 — a. 

We have proved that if u G PPrf then the choice of r = UMIiJeven and a as in (14.9P satisfies the 
requirements of Lemma 14.51 and so we can express u as u = au r + (1 — a)u r+ 2- The converse — 
given a vector u that is a convex combination of vectors in PP^ and PP^ + it is in PP^ — holds 
becauseconv(PP^ U PP^ +2 ) C W d . ■ 

A useful consequence of Theorem 14.41 is the following corollary. 

Corollary 4.7 Let u be a vector in [0, l] d . If X)fe=i u k is an even integer then u 6 PP^. 

Proof Let Ylt=i Uk = s - Since u is majorized by a sorted binary vector of weight s then, by 
Theorem 14.41 u £ PPd which, in turn, implies u G PP^. ■ 

We conclude this section by showing that we can easily compute the constituent parity of 
Ilpp d (i;) without explicitly computing the projection of v. 

Lemma 4.8 For any vector v G M. d , let z = H^ ^d(v), the projection of v onto [0, l] d and denote 
by Hpp d (v) the projection of v onto the parity polytope. Then 

[J I % || 1 J even < l|n P p»||i < n|z|| l] even • 

That is, we can compute the constituent parity of the projection of v by projecting v onto [0, l] d 
and computing the even floor. 

Proof Let py = |~||z||i] e ven an d PL = UN 111 J even- We prove the following fact: given any y' G PPrf 
with ||y'||i > pu there exits a vector y G [0, l] d such that ||y||i = pu, V G PPd> and ||u — y\\\ < 
\\v — y'|||. The implication of this fact will be that any vector in the parity polytope with l\ norm 
strictly greater that pu cannot be the projection of v. Similarly we can also show that any vector 
with £1 norm strictly less than pi cannot be the projection on the parity polytope. 

First we construct the vector y based on y' and z. Define the set of "high" values to be the 
coordinates on which y\ is greater than Zi, i.e., H := {i G [d] \ y^ > Z{}. Since by assumption 
II y' ||i > Pu > IN 111 we know that 1%! > 1. Consider the test vector t defined component-wise as 




Zi if i G % , 
y\ otherwise. 
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Note that ||t||i < ||z||i < pu < The vector t differs from y' only in %. Thus, by changing 

(reducing) components of y' in the set T~L we can obtain a vector y such that \\y\\i = pu- hi 
particular there exists a vector y with \\y\\i = pu such that y' i >yi> Zi for i G T~L and yi = y[ for 
i ^ %. Since the l\ norm of y is even and it is in [0, l] d we have by Corollary 14.71 that y G PPrf. 

We next show that for all i G T~L, \vi — yi\ < \vi — y'^. The inequality will be strict for at least 
one i yielding \\v — y\\2 < \\ v ~ 2/' III an< i thereby proving the claim. 

We start by noting that y' G PP^ so y[ G [0, 1] for all i. Hence, if z% < y\ for some i we must 
also have Zi < 1, in which case v i < Zi since z^ is the projection of Vi onto [0, 1]. In summary, Z{ < 1 
iff Vi < 1 and when Zi < 1 then Vi < Zj. Therefore, if y[ > Z{ then z^ > V{. Thus for all i G ^ 
we get y'i > yi > Zi > Vi where the first inequality is strict for at least one i. Since yi = y[ for 
i ^ % this means that |uj — y^j < |Uj — for all i where the inequality is strict for at least one 
value of i. Overall, \\v — < \\v — and both y G PP^ (by construction) and y' G PP^ (by 
assumption). Thus, y' cannot be the projection of v onto PP^- Thus the l\ norm of the projection 
of v, ||ripp tJ (v)||i < pu. A similar argument shows that ||npp d (v) [| i > pl and so ||ilpp d (i7)||i must 
lie in [pl,Pu] ■ 

4.4 Projection Algorithm 

In this section we formulate a quadratic program (Problem PQP) for the projection problem and 
then develop an algorithm (Algorithm [2]) that efficiently solves the quadratic program. 

Given a vector v G M. d , set r = L||Hr 0jl jd (u) ||ijeven- From Lemma 14.81 we know that the con- 
stituent parity of z* := IIpp d (t;) is r. We also know that z* is sorted in descending order if v is. 
Let S be a (d — 1) x d matrix with diagonal entries set to 1, S^i+i = —1 for 1 < i < d — 1, and 
zero everywhere else: 
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The constraint that z* has to be sorted in decreasing order can be stated as Sz* > 0, where is 
the all-zeros vector. 

In addition, Lemma 14.61 implies that z* is a convex combination of vectors of Hamming weight 
r and r + 2. Using inequality (|4.10p we get that a d-vector z G [0, l] d , with 

d 

= ar + {l-a)(r + 2), (4.13) 

i=l 

is a convex combination of vectors of weight r and r + 2 iff it satisfies the following bounds: 

Z(f.) < a min(g, r) + (1 — a) min(g, r+2) V 1 < q < d, (4-14) 

k=l 

where z^ denotes the fcth largest component of z. As we saw in the proof of Lemma 14.51 the 
fact that the components of z are no more than one implies that inequalities (|4.14p are satisfied 
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for all q < r. Also, (|4.13p enforces the inequalities for q > r + 2. Therefore, inequalities in (|4.14p 
for q < r and q > r + 2 are redundant. Note that in addition we can eliminate the variable a by 

solving (|4.13p giving a = 1 + r ip 1 2fc ( see a ^ so Q4.9p ). Therefore, for a sorted vector i>, we can 
write the projection onto PP^ as the optimization problem 

minimize —\\v — z 9 
2 II 112 

subject to < Zi < 1 V i 
Sz>0 

< l + r - ^ fc=1 Zk < 1 (4.15) 
fc=l 

The last two constraints can be simplified as follows. First, constraint (I4.15P simplifies to 
r — Sfe=i z fc — r + ^- Next, defining the vector 

f r = (1,1 1, -1,-1 (4.17) 
r+l d-r-1 

we can rewrite inequality (|4.16p as /jTz < r. Using these simplifications yields the final form of 
our quadratic program: 
Problem PQP: 



■ ■ ■ x n . . 2 

minimize — \\v — z U 

2" 112 

subject to < Zi < 1 V i (4.18) 

Sz > (4.19) 

r<l T z<r + 2 (4.20) 

/J* < r. (4.21) 



The projection algorithm we develop efficiently solves the KKT conditions of PQP. The objective 
function is strongly convex and the constraints are linear. Hence, the KKT conditions are not only 
necessary but also sufficient for optimality. To formulate the KKT conditions, we first construct 
the Lagrangian with dual variables f3, fj,, 7, £, 6, and £: 

£ = 2> - 41 -H r ~ fi z ) ~ a* t ( 1 - z ) - ~i Tz 

- £ (r + 2 - l T z) - C(1 T * - r) - T 5z . 
The KKT conditions are then given by stationarity of the Lagrangian, complementary slackness, 
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and feasibility. 

z = v-Pf r -ti + j- (Z-()1 + S T 0. (4.22) 



o < p 


_L 


f?z - r 


< o 


< n 


_L 


Z<1 




< 7 


_L 


z>0 




o < e 


_L 


Sz>0 




o < i 


_L 


l T z - r 


- 2 < 


o<C 


_L 


l T z - r 


> o. 



A vector z that satisfies (|4,22p and the following orthogonality conditions is equal to the projection 
of v onto PP d . 

To proceed, set /3 max = \ [v r +i — v r +2] and define the parameterized vector 

z(P):=Il [0tl]d (v-Pf r ). (4.23) 

The following lemma implies that the optimizer of PQP, i.e., z* = Hpp d (v), is z(/3 opt ) for some 
Popt G [0,/3 max ]. 

Lemma 4.9 There exists a f3 op t G [0, /3 ma J such that z{f3 op t) satisfies the KKT conditions of the 
quadratic program PQP. 

Proof Note that when /3 > (3 

max we have that z r j r \ < z. r _|_2(/3) and z((3) is ordered differently 
from v and fjz{f3) < r. Consequently z((3) cannot be the projection onto PP^ for (3 > /3 max - At 
the other boundary of the interval, when /3 = we have z(0) = H^ ^d(v). If fjz{{)) = r, then 
z(0) G WFd by Corollary 14.71 But since z(0) is the closest point in [0, l] d to v, it must also be the 
closest point in PP^. 

Assume now that fjz(f$) > r. Taking the directional derivative with respect to /3 increasing, 
we obtain the following: 

dfjz{p) _ T dz(p) 

dp Tr dp 

= ~fr,k 
k: 0< 2fe (/3)<l 

= -\{k I 1 < k < d,0 < z k (p) < 1}\ (4.24) 
< 0. 

proving that fJz{P) is a decreasing function of p. Therefore, by the mean value theorem, there 
exists a /3 opt G [0, /3 max ] such that fj z(p opt ) = r. 

First note that z(p opt ) is feasible for Problem PQP. We need only verify (|4.20|) . Recalling that 
r is defined as r = LI|n[o ) i]d(v)||iJeven, we get the lower bound: 

l T z(P opt ) > fJz{p opt ) = r. 

The components of z{p opt ) are all less than one, so Y^kLi z k(Popt) < r + 1. Combining this with 
the equality fj z{p opt ) = r tells us that Y2k=r+2 z k(Papt) < 1- We therefore find that l T z(/3 opt ) is 
no more than r + 2. 
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To complete the proof, we need only find dual variables to certify the optimality. Setting £, £, 
and 9 to zero, and (i and 7 to the values required to satisfy (|4.22|> provides the necessary assignments 
to satisfy the KKT conditions. ■ 

Lemma 14.91 thus certifies that all we need to do to compute the projection is to compute the 
optimal p. To do so, we use the fact that the function fJz(P) is a piecewise linear function of p. 
For a fixed /3, define the active set to be the indices where z(/3) is strictly between and 1 



A{p) := {k I 1 < k < d,0 < z k (p) < 1} 
Let the clipped set be the indices where z(/3) is equal to 1. 

C(/3) :={k\ l<k<d,z k (P) = l}. 
Let the zero set be the indices where z(/3) is equal to zero 

Z{fi) :={k\ l<k<d,z k (/3) = 0}. 
Note that with these definitions, we have 

P) 



(4.25) 



(4.26) 



(4.27) 



fJz{P) = \C(J3)\ + ^ 
= \C(p)\~p\A(P)\ 



(4.28) 



Our algorithm simply increases beta until the active set changes, keeping track of the sets A(P), 
C(/3), and Z((3). We break the interval [0,/3 max ] into the locations where the active set changes, 
and compute the value of fjz((3) at each of these breakpoints until f^z(f3) < r. At this point, 
we have located the appropriate active set for optimality and can find /3 op t by solving the linear 
equation (I4.28p . 

The breakpoints themselves are easy to find: they are the values of f3 where an index is set 
equal to one or equal to zero. First, define the following sets 



01 

03 



{vi- 1 I 1 <i < r + 1}, 
{vi\l<i<r + l}, 
{-Vi I r + 2 < % < d}, 
{-Vi + 1 I r + 2 < i < d}. 



The sets B\ and B2 concern the r + 1 largest components of v; B3 and the smallest components. 
The set of breakpoints is 



B := 



€ |J Bi 



0</3</3 max ^U{0,^ max }. 



There are thus at most 2d + 2 breakpoints. 

To summarize, our Algorithm [2] sorts the input vector, computes the set of breakpoints, and 
then marches through the breakpoints until it finds a value of Pi E B with fJ'z(Pi) < r. Since we 
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/3i-l f3 op t /5 m ax 

Figure 2: Since there are a finite number of breakpoints (at most 2c? + 2) and the function 
fjz{0) is linear between breakpoints, we can solve for /3 op t in linear time. See (|4.17l) and 
(|4.23|) for definitions of f r and z(j3) respectively. 

will also have fj z(j3i-i) > r, the optimal (3 will lie in [/3j— i, /%] and can be found by solving (I4.28p . 
In the algorithm box for Algorithm [2j b is the largest and a is the smallest index in the active set. 
We use V to denote the sum of the elements in the active set and A the total sum of the vector at 
the current break point. Some of the awkward if statements in the main for loop take care of the 
cases when the input vector has many repeated entries. 

Algorithm [2] requires two sorts (sorting the input vector and sorting the breakpoints) , and then 
an inspection of at most 2d breakpoints. Thus, the total complexity of the algorithm is linear plus 
the time for the two sorts. 

5 Numerical results and implementation 

In this section, we present simulation results for the ADMM decoder and discuss various aspects 
of our implementation. In Section 15.11 we present word-error-rate (WER) results for two LDPC 
codes. In Section 15.21 we discuss how the various parameters choices in ADMM affect decoding 
performance, as measured by error rate and by decoding time. 

5.1 Error-rate performance 

In this section we present WER results for the ADMM decoder. We present simulation results for 
two codes and compare to sum-product BP decoding. We simulate both codes over the AWGN 
channel with binary inputs. The first code is the [2640, 1320] rate-0.5, (3, 6)-regular Margulis LDPC 
code [35] . The second is a [1057, 244] rate-0.77, (3, 13)-regular LDPC code obtained from [26]. This 
code is also studied by Yedidia et al. [55] . We choose both codes as they have been chosen in the 
past to study error-floor performance. 

In Fig. [3] we plot the WER performance of the Margulis code for the ADMM decoder and 
various implementations of sum-product BP decoding. As mentioned, this code has been extensively 
studied in the literature due to its error floor behavior (see, e.g., [BIEUES] ) • Recently it has been 
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Algorithm 2 Given u 6 M. d determine its projection on PP^, z* 

1: Permute u to produce a vector t> whose components are sorted in decreasing order, i.e., v\ > 

V2 > • • • > ^d- Let Q be the corresponding permutation matrix, i.e., i> = Qit. 
2: Compute £ <— Il^ijd^)- 

3: Assign r = LH^HiJcvcn and /3 max = \[z r+1 - z r+2 \. 
4: Define f r as in (^TTj) . 
5: if fJ + iZ < r then 
6: Return z* = z. 
7: end if 

8: Assign £i = {v{ — 1 | 1 < i < r + 1}, 

jCi = {vi I 1 < % < r + 1}, 

£2 = {-Vi \r + 2<i<d}, 

C 2 = {-Vi + 1 \r + 2<i<d}. 
9: Assign the set of breakpoints: 

B := |/3 £ Uf =1 £,- U £j) I < p < /3 max } U {0, /3 max }. 
10: Index the breakpoints in B in a sorted manner to get {Pi}i where f3\ < $2 < • • • < /3|b|- 
11: Initialize a as the smallest index such that < z a < 1. 
12: Initialize b as the largest index such that < £5 < 1. 
13: Initialize sum V = fj z. 
14: for i = 1 to |£>| do 
15: Set /?o <- A- 
16: if & € £1 U £ 2 then 
17: Update a -<— a — 1. 
18: Update U «- V + u a . 
19: else 

20: Update 6^—6+1 
21: Update V <- V - v b . 
22: end if 

23: if i < d and Pi / A+i then 

24: A <- (a — 1) + V — /3 (6 - a + 1) 

25: if A < r then break 

26: else if i = d then 

27: A -<— (a — 1) + y — p (b -o + l) 

28: end if 

29: end for 

30: if A > r then 

31: Compute /3 opt <- j3 - 5=^. 

32: else 

33: fa <- A-l 

34: a <- \{j \vj-P> 1}| 

35: 6^r + 2+|{j I Vj +P>0}\ 

36: ^ESU-^ +2 ^ 

37: Papt <- fe _ a+1 ) 

38: end if 

39: Return z* = Q T U [0tl]d (z - p opt fr)- 
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Figure 3: Word error rate (WER) of the [2540, 1320] "Margulis" LDPC code used on the 
AWGN channel plotted as a function of signal-to- noise ratio (SNR). The WER performance 
of ADMM is compared to that of non-saturating sum-product BP, as well as to results for 
(saturating) sum-product BP from Ryan and Lin [35] and from MacKay and Postol 27 . 

noted 0(9] that the previously observed error floor of this code is, at least partially, a result of 
saturation in the message LLRs passed by the BP decoder. This issue of implementation can be 
greatly mitigated by improving the way large LLRs are handled. Thus, alongside these previous 
results we plot results of our own "non-saturating" sum-product BP implementation, which follows 
the implementation of 0[9], and which matches the results reported therein. In our simulations of 
the ADMM decoder we collect more than 200 errors for all data points other than the highest SNR 
(SNR = 2.8dB), for which we collected 10 errors. 

The first aspect to note is that the LP decoder has a waterfall behavior, but it occurs at a 
slightly higher SNR (about OAdB in this example) than that of sum-product BP. This observation 
is consistent with earlier simulations of LP decoding for long block lengths, e.g., those presented 
in [49^55] . It is worth mentioning that it was show in [52] that fixed points of sum-product BP 
correspond to stationary points of the Bethe approximation of the free energy when the temperature 
parameter T = 1. However, when the temperature parameter in the Bethe approximation is reduced 
to T = minimizing the Bethe free energy is the same as LP decoding, see, e.g., |41| . While the 
objective function in sum-product and LP is thus quite different, both optimization problems are 
subject to the same set of constraints - the "local marginal polytope" (equivalent to the fundamental 
polytope for LP decoding of binary codes, see [41] for details). Since the objective functions are 
different, one should not expect identical performance, as the simulations demonstrate. 

The second aspect to note is that, as in the prior work, we do not observe an error floor in 
LP decoding. Considering decoding of this code using the non-saturating version of sum-product 
we do not observe an abrupt error floor. However, we do see that at WERs of 10~ 8 the waterfall 
of ADMM is continuing to steepen, while that of sum-product BP appears to be dropping at a 
constant slope. In this regime we found that the non-saturating BP decoder is not converging to 
a trapping set, but is rather oscillating, as discussed in [57] [M]. However, as we see in our next 
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Figure 4: Word error rate (WER) of the [1057, 244] LDPC code used on the AWGN channel 
plotted as a function of signal-to- noise ratio (SNR). The WER performance of ADMM is 
compared to that of non-saturating sum-product BP, as well as to an estimated lower-bound 
on ML decoding. 



example, there are codes for which ADMM does not display an error floor while non-saturating 
sum-product BP does. 

Figure [5] presents simulation results for the rate-0.77 length-1057 code. In this simulation, all 
data points are based on more than 200 errors except for the ADMM data at SNR = 5dB, where 
29 errors are observed. In addition we plot an estimated lower bound on maximum likelihood 
(ML) decoding performances. The lower bound is estimated in the following way. In the ADMM 
decoding simulations we round any non-integer solution obtained from the ADMM decoder to 
produce a codeword estimate. If the decoder produces a decoding error, i.e., if the estimate does 
not match the transmitted codeword, we check if the estimate is a valid codeword. If the estimate 
satisfies all the parity checks (and is therefore a codeword) we also compare the probability of 
the estimate given the channel observations with the that of the transmitted codeword given the 
channel observations. If the probability of estimate is greater than that of the transmitted codeword 
we know that an ML decoder would also be in error. All other events are counted as ML successes 
(hence the estimated lower bound on ML performance). In contrast to the Margulis code, Fig. [J] 
shows that for this code the ADMM decoder displays no signs of an error floor, while the BP 
decoder does. Further, ADMM is approaching the ML error lower bound at high SNRs. 

Given the importance of error-floor effects in high reliability applications, and the contrasting 
outcomes of our simulations, we now make some observations. One point demonstrated by these 
experiments, in particular by the simulation of the Margulis code, (and argued in |8,9j) is that 
numerical precision effects can dramatically affect code performance in the high SNR regime. When 
precision is limited, a set of incorrect (flipped) channel symbols connected together in a "trapping 
set" can enforce each others' incorrect beliefs sufficiently so as to outweigh the correct evidence 
from the rest of the code. Even though the rest of the code symbols may be much more certain of 
themselves, particularly if the magnitude of beliefs are limited, those limits can prevent the correct 
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variables from having sufficient influence on the symbols in the trapping set to correct them. From 
a practical point of view, a real-world implementation would use fixed precision arithmetic. Thus, 
understanding decoding behavior under finite precision is extremely important. 

A second point made by comparing the two codes is that the performance of an algorithm, 
e.g., non-saturating BP, can vary dramatically from code to code (Margulis vs. 1057) and the 
performance of a code from algorithm to algorithm (BP vs. ADMM). For each algorithm we might 
think about three types of codes (53]. The first (type-A) would consist of codes that do not have any 
trapping sets, i.e., do not display an error floor, even for low-precision implementations. The second 
(type-B) would consist of codes whose behavior changes with precision (e.g., the Margulis code). 
The final (type-C) would consist of codes that have trapping sets even under infinite precision (the 
length-1057 code may belong to this set). Under this taxonomy there are two natural strategies 
to pursue. The first is to design codes that fall in the first class. This is the approach taken in, 
e.g., [32] [17\ [20] [30] [37|, where codes of large-girth are sought. The second is to design improved 
algorithms that enlarge the set of codes that fall into the first class. This is the approach taken 
in this paper. Since the ADMM decoder has rigorous convergence guarantees, since ADMM has 
historically be observed to be quite robust to parameter choices and precision settings, and since 
the "messages" passed in ADMM (the replica values) are inherently bounded to the unit interval 
(since the parity polytope is contained within the unit hypercube), we expect that the ADMM 
decoder will be a strong competitor to BP in applications that demand ultra-high reliabilities. 

5.2 Parameter choices 

In the ADMM decoding algorithm there are a number of parameters that need to be set. The first 
is the stopping tolerance, e, the second is the penalty parameter, fx, and the third is the maximum 
allowable number of iterations, i max - In our experiments we explored the sensitivity of algorithm 
behavior, in particular word-error-rate and execution-time statistics, as a function of the settings 
of these parameters. In this section we present results that summarize what we learned. We 
report results for the [1057, 244] LDPC code. We note that, in contrast to the simulation results 
for the AWGN channel presented in the last subsection, in this section we report on simulation 
results for the binary symmetric channel (BSC). We assume the BSC results from hard-decision 
demodulation of a BPSK ±1 sequence transmitted over an AWGN channel. The resulting relation 
between the crossover probability p of the equivalent BSC-p and the SNR of the AWGN channel is 
p = Q (^/2R ■ iQSNR/w^j ^ w h ere ji i s the rate of the code and Q(-) is the Q-function. We reported 

on WER performance of ADMM decoding for this code and channel in [3]. 

We first explore the effects of the choice of e and /i on the error rate. We comment that as long 
as i max > 300 the choice of i max does not significantly affect the WER. In Fig. [5] we plot WER as 
a function of the number of bits of stopping tolerance, i.e., — log 2 (e). In Fig. [6] we plot WER as a 
function of \x. Each data point is based on more than 200 decoding errors. 

From these two figures we conclude that the performance of the ADMM decoder depends only 
weakly on the settings of these two parameters, as long as the parameters are chosen sufficiently 
large. For instance e > 10 -4 and [i > 2 should do. This means that the implementer has great 
latitude in the choice of these parameters and can make, e.g., hardware-compatible choices. Fur- 
thermore, the results on ending tolerance give hints as to the needed precision of the algorithm. If 
algorithmic precision is on the order of the needed ending tolerance we expect to observe similar 
error rates. 
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Figure 5: Word error rate (WER) of the [1057, 244] LDPC code for the BSC plotted as a 
function of error tolerance e for three difference penalty parameters fi. The SNR simulated is 
5dB. The maximum number of iterations t max is set equal to 250. 




Figure 6: Word error rate (WER) of the [1057, 244] LDPC code for the BSC plotted as a 
function of penalty parameter \i. Error tolerance e = 10 -4 , and maximum number of iterations 
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Figure 7: Average execution time (in seconds) of ADMM decoding the [1057, 244] code 
simulated over the BSC plotted as a function of penalty parameter /i for three distinct SNRs. 

We next study the effect of parameter section on average decoding time. All time statistics 
were collected on a 2GHz Intel(R) Xeon(R) CPU. In Fig. [7] we plot average decoding time as a 
function of \x for three SNRs. For all three ending tolerance is fixed at e = 10 -4 . Note that based 
on Fig. [6] we should choose [i > 1.5 for best WER performance. We see some weak variability in 
average decoding time as a function of the choice of fj,. 

Now, understanding the various parameters we can tune, we summarize the choices made for 
our simulation results presented in Sec. 15.11 For all simulations we made the following choices: (i) 
error tolerance e = 10~ 5 , (ii) penalty [i = 5, (hi) maximum number of iterations t max = 600 for the 
[2540, 1320] code and i max = 500 for the [1057, 244] code. 

Overrelaxation A significant improvement in average decoding time results from implementing 
an "over-relaxed" version of ADMM. Over-relaxed ADMM is discussed in [5j section 3.4.3] as a 
method for improving convergence speed while retaining convergence guarantees. 

The over-relaxation parameter 7 must be in the range 1 < 7 < 2. If 7 > 2 convergence 
guarantees are lost. We did simulated 7 > 2 and observed an increase in average decoding time. In 
Fig. [8] we plot the effect on average decoding time of over-relaxed versions of the ADMM decoder 
for 1 < 7 < 1.9. These plots are for the length-2640 Margulis code simulated over the AWGN 
channel at an SNR of 2.8dB. We observe that the average decoding time drops by a factor of about 
50% over the range of 7. The improvement is roughly the same for the set of penalty parameters 
studied, n E {1.5,2,2.5,3}. The take-away is that by choosing over-relaxation parameter 7 = 1.9 
we can double decoding efficiency without degradation in error-rate. 

While we did not use overrelaxation in the previously discussed experiments, we would encourage 
interested developers to explore proper settings of 7 in their implementations. 
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Figure 8: Average execution time for ADMM decoding the [2640, 1320] Margulis code sim- 
ulated over the AWGN channel at SNR = 2.8dB. Execution time (in seconds) is plotted as a 
function of over-relaxation parameter 7 for four different penalty parameters \i G {1.5, 2, 2.5, 3}. 

6 Conclusion 

In this paper we apply the ADMM template to the LP decoding problem introduced in [15J. A main 
technical hurdle was the development of an efficient method of projecting a point onto the parity 
polytope. We accomplished this in two steps. We first introduced a new "two-slice" representation 
of points in the parity polytope. We then used the representation to show that the projection via an 
efficient waterfilling-type algorithm. We demonstrate the effectiveness of our decoding technique 
on two codes, on the rate-0.5 [2640, 1320] "Margulis" LDPC code and the rate-0.77 [1057, 244] 
LDPC code studied in |55j . We find that while similar in many aspects there are some significant 
difference between the decoding behavior of LP and sum-product BP decoding. On one hand, the 
waterfall of LP decoding initiates at slightly higher SNR than that of sum-product BP decoding. 
But, on the other, LP decoding does not seem to have an error floor. Fully understanding LP 
decoding performance in this high-SNR regime is an important future direction. What we have 
seen is that LP decoding, when implemented in a distributed, scalable manner using ADMM is 
a strong competitor to BP in the high-SNR regime. It allows LP decoding to be applied to long 
block-length codes, to be implemented as a message-passing algorithm using a very simple message 
update schedule, and to execute as fast as BP. 
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A Dual Subgradient Ascent 

If we work with an (un-augmented) Lagrangian 
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the dual subgradient ascent method consists of the iterations: 



x k+1 := argmin^ L (x, z k , \ k ) 
z k+1 := argmin ze2 L (x k , z,X k ) 

AJ+ 1 := \ k + ^[p ]X k+1 -z k+1 ). 

Note here that the x and z updates are run with respect to the k iterates of the other variables, 
and can be run completely in parallel. 

The ai-update corresponds to solving the very simple LP: 

minimize 17+ >,„•,- -7 jPFA? I x 



v 7 + E^^ t a; 

subject to x £ [0, 1]^. 



This results in the assignment: 



x k+l _ p I DT x fc 




where 

0(t) = 

is the Heaviside function. 

For the z-update, we have to solve the following LP for each j G J: 

maximize A^ T Zj ^ 
subject to zj G PP d . 

Maximizing a linear function over the parity polytope can be performed in linear time. First, 
note that the optimal solution necessarily occurs at a vertex, which is a binary vector with an 
even hamming weight. Let r be the number of positive components in the cost vector A^. If r is 
even, the vector v G PP^ which is equal to 1 where A^ is positive and zero elsewhere is a solution 
of (jA.ip . as making any additional components nonzero decreases the cost as does making any of 
the components equal to 1 smaller. If r is odd, we only need to compare the cost of the vector 
equal to 1 in the r — 1 largest components and zero elsewhere to the cost of the vector equal to 1 
in the r + 1 largest components and equal to zero elsewhere. 

The procedure to solve (jA.ip is summarized in Algorithm [3l Note that finding the smallest 
positive element and largest nonnegative element can be done in linear time. Hence, the complexity 
of Algorithm [3] is 0(d). 

While this subgradient ascent method is quite simple, it is requires vastly more iterations than 
the ADMM method, and thus we did not pursue this any further. 
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Algorithm 3 Given a binary ci-dimensional vector c, maximize c T 2 subject to z € PP^. 

1: Let r be the number of positive elements in c. 
2: if r is even then 

3: Return z* where z* = 1 if q > and z* = otherwise. 
4: else 

5: Find index i p of the smallest positive element of c. 
6: Find index i n of the largest non-positive element of c. 
7: if c ip > c in then 

8: Return z* where z* = 1 if q > 0, z* = 1, and z* = otherwise. 
9: else 

10: Return z* where z* = 1 if q > and % / i p , z* p = 0, and z* = for all other i. 
11: end if 
12: end if 
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